A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
Authors
Abstract
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of partially observable domains by introducing the Bayes-Adaptive Partially Observable Markov Decision Process (BA-POMDP). This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and the agent's return improve as a function of experience.
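As a rough illustration of how capabilities (1) and (2) can interact, the sketch below implements a particle-filter belief update over pairs of a hidden state and Dirichlet counts, so that tracking the state and refining the model estimate happen in a single step. This is a hypothetical minimal example, not the algorithm from the paper: all names (update_belief, sample_next_state, observation_prob) are invented here, and a flat Dirichlet prior (pseudo-count 1) over the unknown transition and observation distributions is assumed.

```python
import random

def sample_next_state(trans_counts, s, a, states):
    """Sample s' from the mean transition model implied by the Dirichlet
    counts; missing entries default to the assumed prior pseudo-count of 1."""
    weights = [trans_counts.get((s, a, s2), 1) for s2 in states]
    return random.choices(states, weights=weights, k=1)[0]

def observation_prob(obs_counts, s2, a, o, observations):
    """Mean probability of observing o after reaching s' via action a."""
    weights = {o2: obs_counts.get((s2, a, o2), 1) for o2 in observations}
    return weights[o] / sum(weights.values())

def update_belief(particles, a, o, states, observations):
    """One Bayes-adaptive belief update. Each particle is a tuple
    (state, transition counts, observation counts): propagate the state,
    weight by the likelihood of the received observation o, record the
    sampled transition and the observation in the counts, then resample."""
    candidates, weights = [], []
    for s, trans_counts, obs_counts in particles:
        s2 = sample_next_state(trans_counts, s, a, states)
        w = observation_prob(obs_counts, s2, a, o, observations)
        tc, oc = dict(trans_counts), dict(obs_counts)  # copy per particle
        tc[(s, a, s2)] = tc.get((s, a, s2), 1) + 1     # stored counts include the prior
        oc[(s2, a, o)] = oc.get((s2, a, o), 1) + 1
        candidates.append((s2, tc, oc))
        weights.append(w)
    return random.choices(candidates, weights=weights, k=len(particles))

# Example: an agent with unknown dynamics starts from a uniform belief,
# then updates it after taking an action and receiving an observation.
states, observations = [0, 1], ["left", "right"]
belief = [(random.choice(states), {}, {}) for _ in range(200)]
belief = update_belief(belief, "a0", "left", states, observations)
```

Planning on top of such a belief could then, for instance, perform a short Monte Carlo lookahead over sampled particles; the approximate planning algorithms the paper actually proposes operate on this kind of augmented state-plus-counts belief.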
Similar articles
The Essential Benefits of the POMDP Approach Can
This article argues that future generations of computer-based systems will need cognitive user interfaces to achieve sufficiently robust and intelligent human interaction. These cognitive user interfaces will be characterized by the ability to support inference and reasoning, planning under uncertainty, short-term adaptation, and long-term learning from experience. An appropriate engineering fr...
Learning discrete Bayesian models for autonomous agent navigation
Partially observable Markov decision processes (POMDPs) are a convenient representation for reasoning and planning in mobile robot applications. We investigate two algorithms for learning POMDPs from series of observation/action pairs by comparing their performance in fourteen synthetic worlds in conjunction with four planning algorithms. Experimental results suggest that the traditional Baum-W...
Decision Making under Uncertainty: Operations Research Meets AI (Again)
Models for sequential decision making under uncertainty (e.g., Markov decision processes, or MDPs) have been studied in operations research for decades. The recent incorporation of ideas from many areas of AI (including planning, probabilistic modeling, machine learning, and knowledge representation) has made these models much more widely applicable. I briefly survey recent advances within AI i...
Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs
Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic and multiagent environment, extending POMDPs to multi-agent settings by including models of other agents in the state space and forming a hierarchical belief structure. In order to predict other agents’ actions using I-POMDP, we propo...
Representing hierarchical POMDPs as DBNs for multi-scale map learning
We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). We use this model for representing and learning multi-resolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or t...
Journal: Journal of Machine Learning Research
Volume: 12, Issue: -
Pages: -
Published: 2011